[Codex] Add handling for Conversational RAG to Validator API #84
Conversation
    assert all(scores[k]["score"] == raw_scores[k]["score"] for k in raw_scores)

def test_prompt_tlm_with_message_history() -> None:
Add a test to confirm that no query rewriting happens when this is the first user message.
Add a test to confirm that the primary TrustworthyRAG.score(prompt, response) call happens with prompt reflecting the full chat history, not with prompt reflecting the rewritten query.
Confirm you are using this TLM utils method:
cleanlab/cleanlab-tlm@a479e32
to turn the chat history into a prompt string.
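For illustration, here is a minimal sketch of what such a history-flattening helper could look like. The function name and role formatting below are hypothetical stand-ins, not cleanlab-tlm's actual API; the real utility referenced in the commit above may format things differently.

```python
from typing import Any


def chat_history_to_prompt(messages: list[dict[str, Any]]) -> str:
    """Flatten a chat history into a single prompt string.

    Hypothetical stand-in for the cleanlab-tlm utility referenced above.
    """
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # cue the model to produce the next turn
    return "\n".join(lines)


messages = [
    {"role": "user", "content": "What is RAG?"},
    {"role": "assistant", "content": "Retrieval-augmented generation."},
    {"role": "user", "content": "How does it reduce hallucinations?"},
]
prompt = chat_history_to_prompt(messages)
```

A test along the lines suggested above could then assert that TrustworthyRAG.score receives this full-history prompt rather than the rewritten query.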
    return "other_issues"

def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
I think the name validate_messages should be chosen more carefully, given that the validator module already reserves the method name validate on Validator for computing the trustworthiness and Eval scores.
I'd bet we wouldn't change the Validator.validate API, but we could find a different name for validate_messages since it behaves quite differently.
Consider having validate_messages take messages as a required positional argument:

-    def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
+    def validate_messages(messages: list[dict[str, Any]]) -> None:

Everywhere it's called, it receives a messages argument.
The caller already sets a default value for that argument, so I'd advise against setting default values in two function signatures.
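A quick sketch of the suggested signature, with a hypothetical validation body (the actual checks in the PR may differ):

```python
from typing import Any


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Raise if the chat history is malformed.

    Sketch of the suggested required-positional signature; the
    validation rules below are illustrative, not the PR's actual ones.
    """
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content' keys")


validate_messages([{"role": "user", "content": "hi"}])  # passes silently
```

With a required argument, a forgotten messages value fails immediately at the call site instead of silently validating None, and the default lives in exactly one place (the caller).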
    codex_answer, _ = self._project.query(question=query, metadata=metadata)
    return codex_answer

def _maybe_rewrite_query(self, *, query: str, messages: list[dict[str, Any]]) -> str:
This _maybe... prefix implies that we might get something other than a string back from the method. Should the check for self._tlm be done by the caller instead?
The "maybe" is meant to suggest that we might rewrite the query, or might not.
Closed because the conversational capability moved to the backend.
No description provided.